Abstract

We present an optimization framework for solving multi-agent nonlinear programs subject to
inequality constraints while keeping the agents’ state trajectories private. Each agent has an objective
function depending only upon its own state and the agents are collectively subject to global constraints. 
The agents do not directly communicate with each other but instead route messages through a trusted 
cloud computer. The cloud computer adds noise to data being sent to the agents in accordance with the 
framework of differential privacy in order to keep each agent’s state trajectory private from all other 
agents and any eavesdroppers. This private problem can be viewed as a stochastic variational inequality 
and is solved using a projection-based method for solving variational inequalities that resembles a noisy 
primal-dual gradient algorithm. Convergence of the optimization algorithm in the presence of noise 
is proven and a quantifiable trade-off between privacy and convergence is extracted from this proof. 
Simulation results are provided that demonstrate numerical convergence for both $\epsilon$-differential privacy
and $(\epsilon, \delta)$-differential privacy.


I. Introduction 

Optimization problems spread across teams of agents arise naturally in several fields, including
communications [1], [2], robotics [3], [4], machine learning [5], sensor networks [6], [7], and
smart power grids [8], [9], [10]. Correspondingly, many approaches have been developed
to solve problems with a wide variety of formulations. For example, [11] allows for distributed
optimization of non-differentiable objectives with time-varying communication links, [12] considers
a similar problem formulation in which communication links fail over time, and [13] uses
a distributed Newton method to solve dynamic network utility maximization problems. Many
other problem types and solution schemes exist in the literature, and a broad exposition of results
can be found in [14].

In some cases multi-agent optimization is done using sensitive user data. A concrete example
of such a case comes from smart power grids. In smart power grids, homeowners share their
power usage information with others on the grid to allow network management (e.g., frequency
regulation [10]) and to minimize their own power costs. In some cases, the granular power usage
data shared in smart grids can be used to infer sensitive details of users’ personal lives [15],
[16]. In particular, smart grid data can “provide a detailed breakdown of energy usage over a
long period of time, which can show patterns of use,” [16, Page 15, Item 16]. Further, given
these patterns, “[p]rofiles can thus be developed and then applied back to individual households
and individual members of these households,” [16, Page 15, Item 18]. These usage patterns in
turn “could reveal personal details about the lives of consumers, such as their daily schedules,”
[15, Page 2, Paragraph 5].

It is precisely the deduction of such patterns that we wish to prevent in the context of multi-agent
optimization. Based on the potentially revealing nature of some user data, we seek to
optimize while protecting sensitive user data both from eavesdroppers and other agents in the 
network. In some sense, privacy and optimization are competing objectives in that agents who 
only seek to optimize may freely share their states with others in the network, while agents 
concerned only with privacy may be inclined to share no information at all. To privately optimize, 
then, we must strike a balance between these two different, competing objectives. 

One approach to privacy that has recently seen widespread use is differential privacy. Differential
privacy was originally established in the database literature and keeps sensitive database
entries private when a database is queried by adding noise to the result of that query [17], [18].
The authors of [19], [20] survey some of the important developments in this vein. Differential
privacy has been adapted to dynamical systems in order to keep sensitive inputs private from
an adversary observing a system’s outputs [21]. A dynamical system is differentially private if
inputs that are close in the input space produce outputs that have similar probability distributions;
these notions will be made precise in Section III.

It is the dynamical systems notion of differential privacy that we apply to keep agents’ state
trajectories private while optimizing. One appealing aspect of differential privacy is its resilience
to post-processing, which allows for arbitrary processing of private information without the threat
of its privacy guarantees being weakened [21, Theorem 1]. Differential privacy is also robust
to arbitrary side information, meaning that an adversary cannot weaken differential privacy by
much through using information gleaned from another source [22].

There has already been some work on enforcing differential privacy in optimization. In
[23] linear programs are solved in a framework that allows for keeping objective functions
or constraints private. The authors of [24] consider a similar setting wherein linearly constrained
problems with affine objectives are solved while keeping the objective functions private. In the
multi-agent setting, [25] solves distributed consensus-type problems while keeping the agents’
objective functions private, while [26] solves similar problems while keeping each agent’s initial
state private.

In this paper we solve non-linear programs wherein each agent’s state trajectory is sensitive 
information and the agents therefore seek to protect their exact state trajectories from other agents 
and any eavesdroppers. To protect these sensitive data, a trusted cloud computer is used that 
performs certain computations upon information it receives from the agents, makes the results 
of those computations private by adding noise to them, and then sends the private results to each 
agent. Each agent then updates its state locally using the information it received from the cloud, 
and this process of sharing and updating information is repeated. 

Our development of a mixed centralized/decentralized algorithm is motivated by the
prominence of cloud computing in many real-world applications. A survey of existing cloud
applications is given in [27], and that reference elaborates on the scalability of the cloud and its
ability to coordinate many mobile devices. It is precisely these features of the cloud that make it
an attractive choice here. In this paper, the cloud, viewed as a central aggregator, is an integral
part of the optimization process, and we leverage its scalability to aggregate ensemble-level
information, perform computations upon that data in a private manner, and then distribute these
private results to the agents.

The privacy implementation in this paper differs from the aforementioned references on private
optimization in several key ways. We are interested in solving problems in which the agents
collaboratively run an on-line optimization algorithm by sharing (private functions of)
sensitive information. In the problems we consider, each iteration of the optimization algorithm
determines each agent’s next state. That is, the iterates of the optimization algorithm are the
agents’ states, and it is each agent’s desire to keep its state trajectory private to protect information
about its behavior. Accordingly, while the above references on private optimization keep other
problem data private, here we must keep entire trajectories of states private while optimizing.
In addition, we incorporate both nonlinear inequality constraints and set constraints, which, to
our knowledge, have not been explored in other privacy implementations.

Given the need to optimize while remaining private, encryption alone cannot provide the 
privacy guarantees that are needed in the problems we examine. In the “upstream” direction, 
encryption could be used to protect communications sent from the agents to the cloud, provided 
the cloud could decrypt them. However, in the “downstream” direction, when the cloud sends
transmissions to the agents, any encrypted messages from the cloud would naturally need to be 
decrypted by the agents to allow each agent to update its state. While this strategy can protect 
transmissions of sensitive data from eavesdroppers, having the agents decrypt transmissions from 
the cloud would expose all agents’ sensitive data to each agent in the network, violating the 
privacy guarantees that are required by each agent. Instead, what is required here is a privacy 
implementation that protects user data from eavesdroppers and all others in the network, while 
still making that data useful for optimizing. It is for this reason that we use differential privacy. 

A preliminary version of this work appeared in [28]. The current paper adds a proof of
convergence and a convergence estimate, quantifies the privacy-convergence trade-off, and provides
new numerical results for two different privacy mechanisms. The rest of the paper is organized
as follows. Section II lays out the problem to be solved and its method of solution. Next,
Section III covers the necessary elements of differential privacy and relates them to the setting
of optimization. Then Section IV provides a proof of convergence for the optimization algorithm
used here and a bound on its convergence, in addition to exploring the trade-off between
privacy and convergence. Next, Section V provides simulation results to support the theoretical
developments made. Finally, Section VI concludes the paper.


II. Optimization Problem Formulation 


In this section we lay out the problem to be solved. First, in Section II-A, we lay out the multi-agent
problem and then, to aid in the exposition of its solution method, formulate an equivalent
ensemble problem. Then the solution to that problem will be discussed and, in Section II-B, will
be adapted to the cloud-based architecture used here.


A. Problem Overview 

Consider $N$ agents indexed over the set $I := \{1, \ldots, N\}$, with agent $i$ having state
$x_i \in \mathbb{R}^{n_i}$ for some $n_i \in \mathbb{N}$. Agent $i$ seeks to minimize the objective function

\[ f_i : \mathbb{R}^{n_i} \to \mathbb{R}, \]

where $f_i$ depends only upon $x_i$; that is, each agent’s objective function has no dependence upon
the other agents’ states. Using the notation $\nabla_i f_i := \frac{\partial f_i}{\partial x_i}$, we state the following
assumption for objective functions.

Assumption 1: The function $f_i$ is $C^1$ and convex, and $\nabla_i f_i$ is Lipschitz with constant $L_i$ for
all $i \in I$. △

Assumption 1 allows for a broad class of functions to be used as objective functions, including
any $C^2$ convex function on a compact, convex domain (cf. Assumption 2 below). Each agent’s
state is constrained to lie in a given set, which we express as

\[ x_i \in X_i \subset \mathbb{R}^{n_i}. \]


Regarding each set $X_i$ we state the following assumption.

Assumption 2: Each set $X_i$ is non-empty, compact, and convex. △

In particular, Assumption 2 admits box constraints, which are common in some multi-agent
problems.

Now define the ensemble state vector

\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix} \in \mathbb{R}^n, \]

where $n = \sum_{i=1}^{N} n_i$. We impose global inequality constraints on the agents by requiring

\[ g(x) := \begin{pmatrix} g_1(x) \\ g_2(x) \\ \vdots \\ g_m(x) \end{pmatrix} \le 0, \]

where the above inequality is enforced component-wise, i.e., $g_j(x) \le 0$ for all $j \in J :=
\{1, \ldots, m\}$. We now state our assumptions on $g$.

Assumption 3: The function $g : \mathbb{R}^n \to \mathbb{R}^m$ is $C^1$ and convex. In addition, for $\mathbb{R}^n$ and $\mathbb{R}^m$ both
equipped with the same $p$-norm, the function $g_{x_i} := \frac{\partial g}{\partial x_i}$ is Lipschitz continuous with constant
$K_i^p$ for all $i \in I$ with respect to the metric induced by the $p$-norm. In addition the function $g$
is Lipschitz with constant $K_g^p$ with respect to the same metric. △

In this paper we focus on the cases of $p = 1$ and $p = 2$. Like Assumption 1, Assumption 3
allows for any convex, $C^2$ functions to be used for constraints whenever Assumption 2 holds.
We also have the following assumption on $g$.

Assumption 4: The constraints satisfy Slater’s condition, namely there exists a point $\bar{x} \in X$
such that $g(\bar{x}) < 0$. △

Assumption 4 is commonly enforced in nonlinear programming problems to guarantee that
strong duality holds. Under Assumptions 1-4 we state an ensemble-level optimization problem.
To do so, we define the ensemble objective

\[ f(x) = \sum_{i=1}^{N} f_i(x_i), \]

and the set

\[ X := \prod_{i=1}^{N} X_i, \]

where the product is meant in the Cartesian sense. To fix ideas, we state the following optimization
problem that does not yet incorporate privacy; privacy will be formally included in
Problem 1 in Section III-B.

Problem 0.1: (Preliminary; no privacy requirement)

\[ \begin{aligned} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & g(x) \le 0 \\ & x \in X. \end{aligned} \]

◊


We note here that Problem 0.1 will be solved without having agent $i$ share $f_i$ or $X_i$ with the
other agents or with the cloud because these data are considered sensitive information. Similarly,
$g$ is considered sensitive and the cloud does not share $g$ with any of the agents. The Lagrangian
associated with Problem 0.1 is

\[ L(x, \mu) = f(x) + \mu^T g(x), \]

where $\mu$ is a vector of Kuhn-Tucker multipliers in the non-negative orthant of $\mathbb{R}^m$, denoted $\mathbb{R}^m_+$.
Under Assumptions 1, 2, and 3 a primal solution $\hat{x}$ exists and the set of all primal solutions is
non-empty and compact. With the addition of Assumption 4, a dual solution $\hat{\mu}$ exists and the
optimal primal and dual values are equal [29, Proposition 6.4.3].





Under Assumptions 1-4, a point $\hat{x}$ solves Problem 0.1 if and only if there exists a point
$\hat{\mu} \in \mathbb{R}^m_+$ such that $(\hat{x}, \hat{\mu})$ is a saddle point of $L$, that is, if and only if the point
$(\hat{x}, \hat{\mu})$ satisfies

\[ L(\hat{x}, \mu) \le L(\hat{x}, \hat{\mu}) \le L(x, \hat{\mu}) \tag{1} \]

for all $(x, \mu) \in X \times \mathbb{R}^m_+$ [29, Proposition 6.2.4]. It is as saddle points of $L$ that we seek solutions
$(\hat{x}, \hat{\mu})$ to Problem 0.1.


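Toward that end, we next define the symbols

\[ L_x := \frac{\partial L}{\partial x}, \quad L_\mu := \frac{\partial L}{\partial \mu}, \quad g_x := \frac{\partial g}{\partial x}, \quad \text{and} \quad f_x := \frac{\partial f}{\partial x}, \]

and define the map

\[ G(x, \mu) = \begin{pmatrix} L_x(x, \mu) \\ -L_\mu(x, \mu) \end{pmatrix}. \]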

In what follows, it is necessary for $G$ to be a Lipschitz mapping. Though the maps $f_x$ and $g_x$
are Lipschitz by Assumptions 1 and 3, $G$ itself cannot be shown to be Lipschitz because its
domain, $X \times \mathbb{R}^m_+$, is unbounded by virtue of $\mathbb{R}^m_+$ being unbounded. To rectify this situation, we
use Equation (1) to find a non-empty, convex, compact set containing $\hat{\mu}$ as was done in [30].
From the second inequality in Equation (1) we have

\[ f(\hat{x}) + \hat{\mu}^T g(\hat{x}) \le f(\bar{x}) + \hat{\mu}^T g(\bar{x}) \]

for $\bar{x}$ the Slater point as defined in Assumption 4. By the complementary slackness condition
we have

\[ f(\hat{x}) \le f(\bar{x}) + \hat{\mu}^T g(\bar{x}). \]


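Rearranging we find

\[ \sum_{j=1}^{m} \hat{\mu}_j \le \frac{f(\bar{x}) - f(\hat{x})}{\min_{1 \le j \le m} \{-g_j(\bar{x})\}} \le \frac{f(\bar{x}) - f(x^*)}{\min_{1 \le j \le m} \{-g_j(\bar{x})\}}, \]

where $x^* \in \arg\min_{x \in X} f(x)$. We then define the set

\[ \mathcal{M} := \left\{ \mu \in \mathbb{R}^m_+ : \|\mu\|_1 \le \frac{f(\bar{x}) - f(x^*)}{\min_{1 \le j \le m} \{-g_j(\bar{x})\}} \right\}, \]

which is non-empty, compact, and convex by definition, and which contains $\hat{\mu}$. For economy of
notation, we define the symbols $Z := X \times \mathcal{M}$ and $\hat{z} := (\hat{x}, \hat{\mu})$, and we will use $z := (x, \mu)$ to
denote an arbitrary point in $Z$.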
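
As a worked illustration of this bound, consider a hypothetical two-agent instance that is not taken from the paper: $f(x) = (x_1 - 2)^2 + (x_2 - 2)^2$, a single constraint $g(x) = x_1 + x_2 - 2$, and $X = [0, 3]^2$, with the Slater point chosen as $\bar{x} = (0, 0)$. Under these assumptions the quantities in the definition of the dual set evaluate as follows.

```latex
\begin{align*}
f(\bar{x}) &= 8, \qquad
f(x^*) = f\big((2, 2)\big) = 0, \qquad
\min_{1 \le j \le m} \{-g_j(\bar{x})\} = 2, \\
\mathcal{M} &= \Big\{ \mu \in \mathbb{R}_+ : \mu \le \tfrac{8 - 0}{2} = 4 \Big\} = [0, 4].
\end{align*}
```

The saddle point of this instance is $\hat{x} = (1, 1)$ with $\hat{\mu} = 2$ (from $2(\hat{x}_i - 2) + \hat{\mu} = 0$ and $\hat{x}_1 + \hat{x}_2 = 2$), so the dual solution does lie in the compact set $[0, 4]$, as the construction requires.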

Since $L(\cdot, \mu)$ is convex for all $\mu \in \mathcal{M}$ and $L(x, \cdot)$ is concave for all $x \in X$, we see that $G$ is
monotone [31, Theorem A]. Under Assumptions 1-4, a primal-dual pair $(\hat{x}, \hat{\mu})$ is a saddle point
of $L$ if and only if it solves the following variational inequality (VI) [32, Corollary 11.1].

Problem 0.2: (VI formulation; no privacy requirement) Find a point $\hat{z} \in Z$ such that

\[ \langle z - \hat{z}, G(\hat{z}) \rangle \ge 0 \]

for all $z \in Z$. ◊

Further discussion on the equivalence of Problems 0.1 and 0.2 is given in [33, Sections 1.3.1,
1.3.2, and 1.4.1]. Privacy is formally added to Problem 0.2 in the statement of Problem 1 in
Section III-B.

We will use the notation VI$(K, F)$ to denote the generic problem of finding a point $x \in K$
such that

\[ \langle y - x, F(x) \rangle \ge 0 \]

for all $y \in K$, and we will use the notation SOL$(K, F)$ to denote the solution set of VI$(K, F)$.
The symbols $Z$ and $G$ refer to the specific problem under consideration in this paper so that
Problem 0.2 is denoted VI$(Z, G)$ and its solution set is SOL$(Z, G)$. It is in the setting of
variational inequalities that we will proceed and we focus on solving Problem 0.2 with the
understanding that its solutions also solve Problem 0.1.


For a compact set $K$ and a monotone map $F$, one method of solving the variational inequality
VI$(K, F)$ is using a projection method with an iterative Tikhonov regularization as was done
for deterministic variational inequalities in [34] and for stochastic variational inequalities in [35];
these methods regularize the earlier Goldstein-Levitin-Polyak method for solving such problems
[36], [37]. The basic principle underlying these methods is that a point in SOL$(K, F)$ can be
approached iteratively with $F$ specifying the direction in which to move at each iteration. To
endow this procedure with greater numerical stability and, as will be shown, robustness to noise,
the $k$th iteration specified in [34], [35] instead uses the direction specified by $F + \alpha_k I$ with $I$
the identity map, $\alpha_k > 0$, and $\alpha_k \to 0$. When $F$ is monotone, each map $F + \alpha_k I$ is strongly
monotone so that SOL$(K, F + \alpha_k I)$ is a singleton. Letting $y_k$ denote the (unique) element of
SOL$(K, F + \alpha_k I)$, for $\alpha_k > 0$ and $\alpha_k \to 0$ we have $y_k \to y_0$, where $y_0$ is the least-norm element
of SOL$(K, F)$ (which itself is non-empty because $K$ is compact and $F$ is monotone).
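
A one-dimensional example may help to show how the regularization selects the least-norm solution. The instance below is a hypothetical illustration, not one drawn from [34] or [35]: take $K = [1, 2] \subset \mathbb{R}$ and $F \equiv 0$, which is monotone but not strongly monotone.

```latex
\begin{align*}
\mathrm{SOL}(K, F) &= [1, 2]
  && \text{(every point of $K$ satisfies $\langle y - x, 0 \rangle \ge 0$)}, \\
\mathrm{SOL}(K, F + \alpha_k I) &= \{ y_k \} \text{ with } (y - y_k)\,\alpha_k y_k \ge 0 \ \ \forall y \in [1, 2]
  && \Longrightarrow \quad y_k = 1, \\
y_0 &= \arg\min_{x \in [1, 2]} |x| = 1 .
\end{align*}
```

Here the unregularized problem has a continuum of solutions, while each regularized problem has the single solution $y_k = 1$, which coincides with the least-norm element $y_0$ for every $\alpha_k > 0$.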

Given an initial point $z(0) \in Z$, the deterministic form of the regularized method to solve
VI$(Z, G)$ is given below in Algorithm 1.

Algorithm 1: Given a point $z(0) \in Z$, apply the update law

\[ \begin{aligned}
z(k+1) &= \Pi_Z \big[ z(k) - \gamma_k \big( G(z(k)) + \alpha_k z(k) \big) \big] \\
&= \Pi_Z \begin{bmatrix} x(k) - \gamma_k \big( f_x(x(k)) + g_x(x(k))^T \mu(k) + \alpha_k x(k) \big) \\ \mu(k) + \gamma_k \big( g(x(k)) - \alpha_k \mu(k) \big) \end{bmatrix}
\end{aligned} \tag{2} \]

until a fixed point $\hat{z}$ is reached. ◊

Here $\alpha_k$ is the regularization parameter at timestep $k$ and $\gamma_k$ is the step-size at the same
timestep. In Section III we will use Algorithm 1 to solve a private optimization problem, and in
Section IV we provide hypotheses on $\gamma_k$ and $\alpha_k$ sufficient for convergence. We now show
the applicability of this style of solution to the cloud architecture mentioned above.
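
To make the update law of Algorithm 1 concrete, the following Python sketch applies it to a small hypothetical instance. The problem data, the bound on the dual iterate, and the schedules chosen for the step-size and regularization parameter are all illustrative assumptions; they are not the hypotheses of Section IV.

```python
def project_box(v, lo, hi):
    """Euclidean projection of a list of scalars onto the box [lo, hi]^n."""
    return [min(max(vi, lo), hi) for vi in v]


def algorithm1(num_iters=20000):
    # Hypothetical instance (an assumption for illustration only):
    #   f(x) = (x1 - 2)^2 + (x2 - 2)^2,  g(x) = x1 + x2 - 2 <= 0,  X = [0, 3]^2.
    # The dual iterate is kept in [0, 4], the bound obtained from the
    # Slater point (0, 0) for this instance.
    x, mu = [0.0, 3.0], 0.0
    for k in range(num_iters):
        gamma = 0.2 * (k + 1) ** -0.55   # step-size gamma_k (assumed schedule)
        alpha = (k + 1) ** -0.3          # regularization alpha_k (assumed schedule)
        f_x = [2.0 * (xi - 2.0) for xi in x]
        g_x = [1.0, 1.0]                 # gradient of the single constraint
        g = x[0] + x[1] - 2.0
        # Primal block of Equation (2), projected onto X.
        x_next = project_box(
            [xi - gamma * (fi + gi * mu + alpha * xi)
             for xi, fi, gi in zip(x, f_x, g_x)], 0.0, 3.0)
        # Dual block of Equation (2), projected onto [0, 4]; uses the old x.
        mu_next = min(max(mu + gamma * (g - alpha * mu), 0.0), 4.0)
        x, mu = x_next, mu_next
    return x, mu
```

For this instance the saddle point is $\hat{x} = (1, 1)$, $\hat{\mu} = 2$. Because the regularization parameter is still positive after finitely many iterations, the returned iterate is expected to sit near, rather than exactly at, the saddle point.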


B. Communications 

If we separate the update law in Algorithm 1 to examine the per-agent (primal) update law,
we find that agent $i$ executes

\[ x_i(k+1) = \Pi_{X_i} \big[ x_i(k) - \gamma_k \big( \nabla_i f_i(x_i(k)) + g_{x_i}(x(k))^T \mu(k) + \alpha_k x_i(k) \big) \big]. \tag{3} \]

The only terms on the right-hand side of this update law that contain information from other
agents are $g_{x_i}(x(k))$ and $\mu(k)$. Though $g_{x_i}$ is a function of all states in the network, the agents
do not send their states to each other directly to allow for its computation because doing so
may reveal sensitive information. Instead, every agent sends its state to a trusted cloud computer
which computes $g_{x_i}(x(k))$ for every $i \in I$. Because no agent has every agent’s state value, no
agent can compute $\mu(k)$ (cf. Equation (2)) and therefore the cloud computes $\mu(k)$ as well using
the update law

\[ \mu(k+1) = \Pi_{\mathcal{M}} \big[ \mu(k) + \gamma_k \big( g(x(k)) - \alpha_k \mu(k) \big) \big]. \]


Then, to use Algorithm 1 with this architecture, the cloud sends (private forms of) $g_{x_i}(x(k))$
and $\mu(k)$ to agent $i$; the modifications to these quantities to make them private are covered
in Section III. The cloud is assumed to be a powerful computer capable of carrying out these
calculations quickly so that they reliably arrive at the agents in a timely fashion.

With this communications scheme, at timestep $k$ four actions occur. First, agent $i$ sends $x_i(k)$
to the cloud and the cloud assembles all agents’ states into the vector $x(k)$. Second, the cloud
computes $\mu(k)$ and $g_{x_i}(x(k))$ for all $i \in I$ in a differentially private way. Third, the cloud
sends a private form of $g_{x_i}(x(k))^T \mu(k)$ to agent $i$. Fourth, agent $i$ computes $x_i(k+1)$ while
the cloud simultaneously computes $\mu(k+1)$, and then this sequence of communications and
computations is repeated. Because this happens at every timestep, information in the network is
always synchronized when computations occur and there is no disagreement between the agents
or cloud as to what the value of a particular state is. As a result, the computations that are spread
across the network in this manner produce identical results to Algorithm 1 and the ensemble
problem is, mathematically, equivalent to the cloud-based multi-agent problem.
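
The equivalence between the cloud-based scheme and the ensemble update can be checked on a small example. The sketch below is noise-free (privacy is not yet added) and uses a hypothetical two-agent instance with scalar states; the function names `cloud_step`, `agent_step`, and `ensemble_step` are introduced here for illustration only.

```python
def clip(v, lo, hi):
    return min(max(v, lo), hi)


# Hypothetical instance: f_i(x_i) = (x_i - 2)^2, X_i = [0, 3],
# one shared constraint g(x) = x_1 + x_2 - 2 <= 0, dual iterate in [0, 4].
def grad_f(i, xi):
    return 2.0 * (xi - 2.0)


def cloud_step(x, mu, gamma, alpha):
    """Cloud: assemble x(k), form g_{x_i}(x(k))^T mu(k) per agent, update mu."""
    g = sum(x) - 2.0
    g_x = [1.0 for _ in x]                    # dg/dx_i for this affine g
    messages = [g_xi * mu for g_xi in g_x]    # the vector sent to agent i
    mu_next = clip(mu + gamma * (g - alpha * mu), 0.0, 4.0)
    return messages, mu_next


def agent_step(i, xi, message, gamma, alpha):
    """Agent i: local primal update of Equation (3) using only its own data."""
    return clip(xi - gamma * (grad_f(i, xi) + message + alpha * xi), 0.0, 3.0)


def ensemble_step(x, mu, gamma, alpha):
    """Ensemble form of the same step, Equation (2)."""
    g = sum(x) - 2.0
    x_next = [clip(xi - gamma * (2.0 * (xi - 2.0) + mu + alpha * xi), 0.0, 3.0)
              for xi in x]
    return x_next, clip(mu + gamma * (g - alpha * mu), 0.0, 4.0)


x, mu, gamma, alpha = [0.5, 2.5], 1.0, 0.1, 0.5
messages, mu_cloud = cloud_step(x, mu, gamma, alpha)
x_agents = [agent_step(i, xi, m, gamma, alpha)
            for i, (xi, m) in enumerate(zip(x, messages))]
x_ens, mu_ens = ensemble_step(x, mu, gamma, alpha)
```

In this sketch each agent sees only its own state and the message from the cloud, yet `x_agents` and `mu_cloud` agree with the ensemble quantities `x_ens` and `mu_ens`.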

For simplicity, the forthcoming analysis will be carried out in the ensemble setting. Despite
the mathematical equivalence between the multi-agent and ensemble approaches, the advantage
of the cloud-based approach in practice is that it allows for each agent’s state trajectory to be
kept private while the ensemble approach does not.



III. Private Optimization 


Differential privacy originates in the database literature in computer science and was originally
designed to keep individual entries of a database private [20]. It has recently been extended to
the setting of dynamical systems in [21]. Differential privacy offers a formal definition of privacy
as well as resilience to post-processing and robustness to side information. This resilience to
post-processing prevents an adversary from weakening the guarantees of differential privacy
by performing post-hoc calculations on private information. Robustness to side information
guarantees that an adversary cannot use information it has gleaned from an alternate source
to fully defeat differential privacy. Below we first review differential privacy, then give a formal
private optimization problem statement, and finally discuss applying privacy to Problem 0.2.

A. Differentially Private Systems 

Let there be $N$ input signals to a system, each contributed by some user. The $i$th input signal is
denoted $u_i$ and is contained in the set $\tilde{\ell}_{p_i}^{s_i}$, namely the space of sequences of $s_i$-vectors equipped
with the $p_i$-norm, with $s_i, p_i \in \mathbb{N}$, such that every finite truncation of $u_i$ is in $\ell_{p_i}^{s_i}$. More explicitly,
let $u_i(k)$ denote the $k$th element of $u_i$ and define the truncation $P_T u_i$ by

\[ P_T u_i(k) = \begin{cases} u_i(k) & \text{for } k \le T \\ 0 & \text{otherwise.} \end{cases} \]

Then we say $u_i \in \tilde{\ell}_{p_i}^{s_i}$ if and only if $P_T u_i$ has finite $p_i$-norm for all values of $T$. Using this
definition, the full input space to the system is

\[ \tilde{\ell}_p^s = \prod_{i=1}^{N} \tilde{\ell}_{p_i}^{s_i}, \]

where the product is meant in the Cartesian sense, and the system produces outputs in $\tilde{\ell}_q^r$.

In this paper we consider the cases where $p_i = 1$ for all $i \in I$ or $p_i = 2$ for all $i \in I$. In the
case of $p_i = 1$, the full input space to the system is $\tilde{\ell}_1^s$ and we use the ordinary 1-norm on this
space. For $p_i = 2$, we likewise use the ordinary 2-norm on $\tilde{\ell}_2^s$. While each of $\|\cdot\|_1$ and $\|\cdot\|_2$
will be used for both the 1-norm and 2-norm on $\mathbb{R}^n$ and $\tilde{\ell}_p^s$, the intent of each symbol can be
discerned from its argument each time it is used.

To implement differential privacy, we must specify which inputs we wish to generate “similar”
outputs. To do this, fix a real number $B > 0$ and define the binary symmetric adjacency relation
$\mathrm{Adj}_B : \tilde{\ell}_p^s \times \tilde{\ell}_p^s \to \{0, 1\}$ as

\[ \mathrm{Adj}_B(u, \tilde{u}) = 1 \iff \|u - \tilde{u}\|_p \le B. \tag{4} \]

Two inputs $u$ and $\tilde{u}$ for which $\mathrm{Adj}_B(u, \tilde{u}) = 1$ are called “adjacent.”
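
The adjacency relation in Equation (4) is straightforward to evaluate on finite truncations of two signals. The sketch below is a minimal illustration; the representation of a signal as a list of equal-length samples and the function names are assumptions made here.

```python
def lp_norm(signal, p):
    """p-norm of a finite truncation of a vector-valued signal (list of lists)."""
    return sum(abs(v) ** p for sample in signal for v in sample) ** (1.0 / p)


def adjacent(u, u_tilde, B, p):
    """Return 1 if ||u - u_tilde||_p <= B on equal-length truncations, else 0."""
    diff = [[a - b for a, b in zip(s, t)] for s, t in zip(u, u_tilde)]
    return 1 if lp_norm(diff, p) <= B else 0


u = [[0.0, 0.0], [1.0, 1.0]]
u_tilde = [[0.3, 0.0], [1.0, 0.6]]
```

For these two signals the difference has 1-norm 0.7 and 2-norm 0.5, so with $B = 0.6$ they are adjacent when $p = 2$ but not when $p = 1$. The choice of norm therefore changes which pairs of trajectories the mechanism is asked to make indistinguishable.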

Towards making precise the notion of “similar” outputs, fix a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and
let $\mathcal{B}^d$ denote the Borel $\sigma$-algebra on $\mathbb{R}^d$. Differential privacy is enforced by a mechanism,
which is a map $M$ taking the form

\[ M : \tilde{\ell}_p^s \times \Omega \to \tilde{\ell}_q^r, \]

and the role of a mechanism is to approximate a system whose inputs are sensitive information.
We now state the definition of a differentially private mechanism. In this definition, we use a
$\sigma$-algebra over $\tilde{\ell}_q^r$, denoted $\Sigma_{q,r}$ (see footnote 1).

Definition 1: A mechanism $M : \tilde{\ell}_p^s \times \Omega \to \tilde{\ell}_q^r$ is $(\epsilon, \delta)$-differentially private if and only if, for
all adjacent $u, \tilde{u} \in \tilde{\ell}_p^s$, we have

\[ \mathbb{P}(M(u) \in S) \le e^{\epsilon}\, \mathbb{P}(M(\tilde{u}) \in S) + \delta \tag{5} \]

for all $S \in \Sigma_{q,r}$. ◊

In Equation (5) it is $\epsilon$ and $\delta$ that determine the privacy policy, and smaller values of each
imply a greater level of privacy for users. In general $\epsilon$ should be kept small, and typical values
for $\epsilon$ range from 0.1 to $\ln 3$. On the other hand, $\delta$ should be kept as small as possible because it
allows for zero-probability events for $M(\tilde{u})$ to have non-zero probability for $M(u)$ and therefore
can allow for important losses in privacy by making it easy for an adversary to distinguish
between outputs. Common values for $\delta$ range from 0 to 0.05; $(\epsilon, 0)$-differential privacy is called
$\epsilon$-differential privacy and, in general, $\epsilon$-differential privacy is stronger than $(\epsilon, \delta)$-differential
privacy precisely because of the aforementioned losses in privacy that can come from $\delta > 0$. For
this reason, $(\epsilon, \delta)$-differential privacy can be regarded as a $\delta$-approximate form of $\epsilon$-differential
privacy [38]. For a fixed value of $\epsilon$, the benefit of using even small values of $\delta > 0$ is that the
variance of noise added can be reduced while maintaining “almost” the same level of privacy.

1 An explicit construction of this $\sigma$-algebra can be found in [21, Section III-A], though we avoid a lengthy exposition on $\Sigma_{q,r}$
due to the relatively minor role its technical details play in the current work.


B. Private Optimization Problem Statement 


In the setting of Problem 0.2, we want to protect the state trajectory, $x = (x(k))_{k \in \mathbb{N}}$, which
is a sensitive signal in $\tilde{\ell}_p^n$, and in so doing we protect each individual agent’s state trajectory; for
agent $i$ this is $x_i \in \tilde{\ell}_p^{n_i}$. As discussed in Section II, keeping individual agents’ state trajectories
private is necessary when the cloud computes $g_{x_i}$ and $\mu$ at each time $k$. To implement privacy
in these computations, we regard each $g_{x_i}$ as a deterministic, causal, memoryless dynamical
system and seek to make each such system differentially private. Similarly, we regard $g$ as a
deterministic, causal, memoryless dynamical system and seek to make it differentially private
as well. Due to the post-processing property of differential privacy, computing $\mu$ using a private
form of $g$ also implies that $\mu$ keeps each $x_i$ private.

As discussed in Section II-B, the agents do not communicate with each other at all and,
instead, each agent only sends its state to the cloud. The cloud handles all required centralized
computations and sends (privatized forms of) their results to the agents. Denoting by $\tilde{g}_{x_i}$ and $\tilde{g}$
the private forms of $g_{x_i}$ and $g$, respectively, at time $k$ the cloud sends to agent $i$ the vector

\[ \mu_i(k) = \tilde{g}_{x_i}(x(k))^T \mu(k), \]

where $\mu(k)$ is computed by the cloud using $\tilde{g}$. We are interested in having a team of agents
optimize by having the cloud send agent $i$ only $\mu_i(k)$ at time $k$. We require that $\mu_i(k)$ protect
$x_i$ for all $i \in I$, and we implement privacy by
approximating $g_{x_i}$ (for all $i \in I$) and $g$ by differentially private mechanisms. Using this method of
communications, we state the following problem that incorporates both optimization and privacy
objectives, and respects the fact that the objectives and constraints in this problem are sensitive
data.

Problem 1: (Private optimization) Solve Problem 0.2 using Algorithm 1 while

i. the agents communicate only with the cloud (i.e., there is no inter-agent communication),

ii. the cloud makes the systems $g$ and $g_{x_i}$, $i \in I$ (whose inputs are the agents’ state trajectories),
differentially private in the sense of Definition 1,

iii. agent $i$ does not share $f_i$ or $X_i$ with any other agent or the cloud,

iv. the cloud does not share $g$ with any agent. ◊

Towards solving Problem 1, we now review mechanisms which implement differential privacy
for dynamical systems.


C. Privacy-Preserving Mechanisms 

To define a mechanism for enforcing differential privacy, we must first define the sensitivity
of a system, which is used to determine the variance of noise that must be added in
a privacy-preserving mechanism. Letting $\mathcal{G}$ be a deterministic causal system, the sensitivity of
$\mathcal{G}$ is an upper bound on the distance between $\mathcal{G}(u)$ and $\mathcal{G}(\tilde{u})$ whenever $\mathrm{Adj}_B(u, \tilde{u}) = 1$ holds.
Formally we define the $\ell_p$ sensitivity of $\mathcal{G}$, denoted $\Delta_p \mathcal{G}$, as

\[ \Delta_p \mathcal{G} := \sup_{u, \tilde{u} \,:\, \mathrm{Adj}_B(u, \tilde{u}) = 1} \|\mathcal{G}(u) - \mathcal{G}(\tilde{u})\|_p. \]

The mechanism we will use for $\epsilon$-differential privacy is the Laplace mechanism, which adds
noise drawn from a Laplace distribution. Below we use the notation $\mathrm{Lap}(\mu, b)$ to denote the
Laplace distribution with mean $\mu$ and scale parameter $b$.

Theorem 1: ([21, Theorem 4]) Let the adjacency relation defined in Equation (4) be used with
$p = 1$ and let $\mathcal{G}$ be a system with sensitivity $\Delta_1 \mathcal{G}$. Let a constant $\epsilon > 0$ be given and recall that
$r$ is the dimension of the output space. Then the mechanism

\[ M(x) = \mathcal{G}(x) + w, \]

where $w(k) \sim \mathrm{Lap}(0, b/\epsilon)^r$ and $b \ge \Delta_1 \mathcal{G}$, is $\epsilon$-differentially private. ■
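
A minimal sketch of the Laplace mechanism in Theorem 1 is given below, using only the Python standard library. It assumes the system output has already been computed as a list of samples and takes $b$ equal to the supplied sensitivity; the function names are introduced here for illustration. The sampler uses the fact that the difference of two independent exponential random variables with the same rate is Laplace distributed.

```python
import math
import random


def laplace_sample(rng, scale):
    """Draw from Lap(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(outputs, sensitivity, epsilon, rng):
    """Add i.i.d. Lap(0, b / epsilon) noise to every output entry, b = sensitivity."""
    scale = sensitivity / epsilon
    return [[y + laplace_sample(rng, scale) for y in sample] for sample in outputs]


def laplace_pdf(y, mean, scale):
    """Density of Lap(mean, scale) at y."""
    return math.exp(-abs(y - mean) / scale) / (2.0 * scale)


rng = random.Random(0)
private = laplace_mechanism([[1.0, 2.0], [3.0, 4.0]],
                            sensitivity=0.5, epsilon=math.log(3), rng=rng)
```

The role of the scale $b/\epsilon$ can be seen in the scalar case: if two outputs differ by at most the sensitivity, the ratio of the two noisy output densities at any point is at most $e^{\epsilon}$, which is the inequality in Definition 1 with $\delta = 0$.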

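For $(\epsilon, \delta)$-differential privacy, we will use the Gaussian mechanism. Its definition requires that
we first define $\kappa(\delta, \epsilon)$ using the $Q$-function,

\[ Q(y) := \frac{1}{\sqrt{2\pi}} \int_y^{\infty} e^{-\frac{v^2}{2}} \, dv. \]

The function $\kappa(\delta, \epsilon)$ is defined for $\epsilon > 0$ and $0 < \delta < \frac{1}{2}$ as

\[ \kappa(\delta, \epsilon) := \frac{1}{2\epsilon} \left( K_\delta + \sqrt{K_\delta^2 + 2\epsilon} \right), \]

where $K_\delta = Q^{-1}(\delta)$. We now define the Gaussian mechanism.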

Theorem 2: ([21, Theorem 3]) Let the adjacency relation defined in Equation (4) be used with
$p = 2$ and let $\mathcal{G}$ be a system with sensitivity $\Delta_2 \mathcal{G}$, with constants $\epsilon > 0$ and $0 < \delta < \frac{1}{2}$ given
and $r$ the dimension of the output space. Then the mechanism

\[ M(x) = \mathcal{G}(x) + w, \]

where $w(k) \sim \mathcal{N}(0, \sigma^2 I_r)$, is $(\epsilon, \delta)$-differentially private for $\sigma \ge \kappa(\delta, \epsilon) \Delta_2 \mathcal{G}$. ■

Theorems 1 and 2 provide a lower bound on the variance of each noise that is added and we
assume that these variances are also chosen to be finite. We now compute the sensitivities that
are needed to implement differential privacy in Problem 1.
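
The noise level required by the Gaussian mechanism can be computed directly from the formula for $\kappa(\delta, \epsilon)$. The sketch below uses `statistics.NormalDist` from the Python standard library to invert the $Q$-function through the identity $Q^{-1}(\delta) = \Phi^{-1}(1 - \delta)$; the function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def q_inverse(delta):
    """Inverse of Q(y) = P(N(0, 1) > y), via the standard normal quantile."""
    return NormalDist().inv_cdf(1.0 - delta)


def kappa(delta, epsilon):
    """kappa(delta, epsilon) = (K + sqrt(K^2 + 2 epsilon)) / (2 epsilon), K = Q^{-1}(delta)."""
    if not (epsilon > 0.0 and 0.0 < delta < 0.5):
        raise ValueError("requires epsilon > 0 and 0 < delta < 1/2")
    k_delta = q_inverse(delta)
    return (k_delta + math.sqrt(k_delta ** 2 + 2.0 * epsilon)) / (2.0 * epsilon)


def gaussian_sigma(delta, epsilon, sensitivity_l2):
    """Smallest standard deviation permitted by Theorem 2."""
    return kappa(delta, epsilon) * sensitivity_l2
```

With the values mentioned earlier, $\epsilon = \ln 3$ and $\delta = 0.05$, this gives $\kappa \approx 1.756$, so the standard deviation of the added noise must be roughly 1.76 times the $\ell_2$ sensitivity. Decreasing $\delta$ increases $K_\delta$ and therefore increases the required noise.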


D. Computing Sensitivities 

In Problem 1 it is desired to protect the value of $x$, including from agents in the network.
In the per-agent update law in Equation (3), $x(k)$ appears in $g_{x_i}$ and $g_{x_i}$ must therefore be
made private before the cloud sends $g_{x_i}(x(k))$ to agent $i$. To protect $x$ in this way, the cloud
adds noise directly to $g_{x_i}(x(k))$, and the variance of noise that must be added depends on the
sensitivity of $g_{x_i}$. To compute the sensitivity of $g_{x_i}$ we regard it as a memoryless dynamical
system and generalize it to act on entire signals of states. Recalling that $x(k) \in X$ for all $k$,
under Assumption 2 $X$ is bounded and therefore $x(k)$ is as well for all $k \in \mathbb{N}$. Then $x \in \tilde{\ell}_p^n$.
We now overload the notation $g_{x_i}$ by allowing it to act on elements of $\tilde{\ell}_p^n$. In particular, $g_{x_i}$
acts on elements of $\mathbb{R}^n$ as before and for state trajectories $x \in \tilde{\ell}_p^n$ we define

\[ g_{x_i}(x) := \big( g_{x_i}(x(k)) \big)_{k \in \mathbb{N}}. \]

We now fix a real scalar $B > 0$. For two state trajectories $x, \tilde{x} \in \tilde{\ell}_p^n$ such that $\mathrm{Adj}_B(x, \tilde{x}) = 1$
holds, we compute the sensitivity of $g_{x_i}$ according to

\[ \begin{aligned}
\Delta_p g_{x_i} &= \sup_{x, \tilde{x} \,:\, \mathrm{Adj}_B(x, \tilde{x}) = 1} \|g_{x_i}(x) - g_{x_i}(\tilde{x})\|_p \\
&\le \sup_{x, \tilde{x} \,:\, \mathrm{Adj}_B(x, \tilde{x}) = 1} K_i^p \Big( \sum_{k=0}^{\infty} \|x(k) - \tilde{x}(k)\|_p^p \Big)^{1/p} \le K_i^p B,
\end{aligned} \]

where we have used the Lipschitz property in Assumption 3 and $\|x - \tilde{x}\|_p \le B$, and where this
bound on the sensitivity holds for $g_{x_i}$ for all $i \in I$.


In computing $\mu(k)$, the cloud must also add noise in some fashion because $\mu(k)$ depends upon
$x(k)$. We regard $g$ as a dynamical system and make it private, and the resilience of differential
privacy to post-processing guarantees that $\mu = (\mu(k))_{k \in \mathbb{N}}$ keeps $x$ private. To compute the
sensitivity of $g$ we extend it to act on $x \in \tilde{\ell}_p^n$ as above. For $x, \tilde{x} \in \tilde{\ell}_p^n$ satisfying $\mathrm{Adj}_B(x, \tilde{x}) = 1$
we use the same procedure as was used above for $g_{x_i}$ to find

\[ \Delta_p g = \sup_{x, \tilde{x} \,:\, \mathrm{Adj}_B(x, \tilde{x}) = 1} \|g(x) - g(\tilde{x})\|_p \le K_g^p B. \]
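
As a worked instance of this bound, suppose (hypothetically) that the constraints are affine, $g(x) = Ax - b$ with $A \in \mathbb{R}^{m \times n}$, and that $p = 2$. Then the sensitivity can be bounded explicitly in terms of the induced 2-norm of $A$.

```latex
\begin{align*}
\|g(x) - g(\tilde{x})\|_2
  &= \Big( \sum_{k \in \mathbb{N}} \|A\,(x(k) - \tilde{x}(k))\|_2^2 \Big)^{1/2} \\
  &\le \|A\|_2 \Big( \sum_{k \in \mathbb{N}} \|x(k) - \tilde{x}(k)\|_2^2 \Big)^{1/2}
   = \|A\|_2 \, \|x - \tilde{x}\|_2
   \le \|A\|_2 \, B .
\end{align*}
```

In this affine case one may therefore take the Lipschitz constant of $g$ to be $\|A\|_2$, the largest singular value of $A$, and the noise required by the Gaussian mechanism scales linearly with both $\|A\|_2$ and the adjacency parameter $B$.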


Having computed the requisite sensitivities, we return to solving Problem 1.


E. Optimizing in the Presence of Noise 

We now examine how noise appears in Algorithm 1 once it has been added for privacy. For
$g_{x_i}(x(k))$ we add noise $w_i(k) \in \mathbb{R}^{m \times n_i}$ drawn from either a Laplace or Gaussian distribution
and for $g(x(k))$ we add noise $w_g(k) \in \mathbb{R}^m$ drawn from the same class of distribution as the $w_i$,
with all noises independent. Define $w_x$ by

\[ w_x = \begin{pmatrix} w_1 & w_2 & \cdots & w_N \end{pmatrix} \in \mathbb{R}^{m \times n}. \]


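In ensemble form the private dynamics under consideration are

$$z(k+1) = \begin{pmatrix} x(k+1) \\ \mu(k+1) \end{pmatrix} = \Pi_Z\left[ z(k) - \gamma_k \begin{pmatrix} f_x(x(k)) + \left(\frac{\partial g}{\partial x}(x(k)) + w_x(k)\right)^T \mu(k) + \alpha_k x(k) \\ -g(x(k)) + w_g(k) + \alpha_k \mu(k) \end{pmatrix} \right].$$

Expanding, we find

$$z(k+1) = \begin{pmatrix} x(k+1) \\ \mu(k+1) \end{pmatrix} = \Pi_Z\left[ z(k) - \gamma_k \begin{pmatrix} f_x(x(k)) + \frac{\partial g}{\partial x}(x(k))^T \mu(k) + w_x(k)^T \mu(k) + \alpha_k x(k) \\ -g(x(k)) + w_g(k) + \alpha_k \mu(k) \end{pmatrix} \right]. \tag{6}$$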


Because $\mu(k) \in \mathbb{R}^m_+$, each element of $w_x(k)^T \mu(k)$ is some weighted combination of elements
of $w_x(k)$ with non-negative weights. Combined with the independence of the noises used for
privacy, this results in each entry of $w_x(k)^T \mu(k)$ being a random variable having variance that
is the weighted sum of variances of elements of $w_x(k)$. With this in mind we define the random
vector $w_s(k) = w_x(k)^T \mu(k)$, which has finite variance (since $\mu(k)$ is contained in $M$
and $w_x(k)$ has finite variance) and zero mean (because $w_i(k)$ has zero mean for all $k \in \mathbb{N}$ and
all $i \in I$). Then we can rewrite Equation (6) as


$$z(k+1) = \Pi_Z\left[ z(k) - \gamma_k \begin{pmatrix} f_x(x(k)) + \frac{\partial g}{\partial x}(x(k))^T \mu(k) + w_s(k) + \alpha_k x(k) \\ -g(x(k)) + w_g(k) + \alpha_k \mu(k) \end{pmatrix} \right] = \Pi_Z\big[z(k) - \gamma_k\big(G(z(k)) + \alpha_k z(k) + w(k)\big)\big],$$

where $w(k)$ denotes the noise added at timestep $k$ and aggregates all noisy signals used for
privacy. We state this stochastic update law as Algorithm 2.

Algorithm 2: Given $z(0) \in Z$, apply the update law

$$z(k+1) = \Pi_Z\big[z(k) - \gamma_k\big(G(z(k)) + \alpha_k z(k) + w(k)\big)\big]$$

until a fixed point $z \in Z$ is reached.

We note that by its definition $E[w(k)] = 0$, and observe that this noisy update law is equivalent
to Algorithm 1 with an additional noise term added.
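
The update law of Algorithm 2 can be sketched on a toy problem. The following is a minimal illustration, not the simulation reported later: the scalar problem (minimize $(x-2)^2$ subject to $x - 1 \leq 0$), the sets $X = [-10, 10]$ and $M = [0, 10]$, the noise scale, and all function names are assumptions made only for this example. For this toy problem the saddle point is $(x, \mu) = (1, 2)$, and the iterates should settle near it.

```python
import random


def project(value, lower, upper):
    """Euclidean projection of a scalar onto the interval [lower, upper]."""
    return max(lower, min(upper, value))


def laplace(rng, scale):
    """Sample Lap(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_regularized_iteration(num_steps=20000, noise_scale=0.5, seed=0):
    """Run z(k+1) = Pi_Z[z(k) - gamma_k (G(z(k)) + alpha_k z(k) + w(k))].

    Hypothetical toy problem: minimize (x - 2)^2 subject to x - 1 <= 0,
    with X = [-10, 10] and multiplier set M = [0, 10].
    """
    rng = random.Random(seed)
    # Illustrative step-size constants (assumed, not the paper's values).
    alpha_bar, gamma_bar, c1, c2 = 0.1, 0.1, 0.3, 0.52
    x, mu = 0.0, 0.0
    for k in range(1, num_steps + 1):
        alpha_k = alpha_bar * k ** (-c1)   # Tikhonov regularization weight
        gamma_k = gamma_bar * k ** (-c2)   # step size
        w_x = laplace(rng, noise_scale)    # noise on the constraint gradient
        w_g = laplace(rng, noise_scale)    # noise on the constraint value
        # Primal row: f_x + (g_x + w_x)^T mu + alpha_k x, with g_x = 1 here.
        grad_x = 2.0 * (x - 2.0) + (1.0 + w_x) * mu + alpha_k * x
        # Dual row: -g + w_g + alpha_k mu, with g(x) = x - 1.
        grad_mu = -(x - 1.0) + w_g + alpha_k * mu
        # Both rows use the old (x, mu); the tuple assignment updates them together.
        x, mu = (project(x - gamma_k * grad_x, -10.0, 10.0),
                 project(mu - gamma_k * grad_mu, 0.0, 10.0))
    return x, mu


x_final, mu_final = noisy_regularized_iteration()
print(x_final, mu_final)
```

The decaying weight $\alpha_k$ biases each iterate slightly toward the origin, so the final point is expected to be close to, not exactly at, the saddle point.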

IV. Convergence of Private Optimization 

In this section we prove the convergence of Algorithm 2. Algorithm 2 was first presented in
[34] without noise and was presented in its noisy form in [35]. Both papers omit proofs and, due
to the heavy dependence of this work upon Algorithm 2, we provide a proof here. To the best of
our knowledge a proof of the convergence of Algorithm 2 as stated in [35] is not available in the
literature; similar work is presented in [39], [?], which cover algorithms related to Algorithm 2,
though those works impose additional assumptions upon $\alpha_k$ and study problems that differ from
the one considered here.

A. Main Convergence Result 

Now we explore in depth solving variational inequalities using a Tikhonov regularized projection
method, the basic elements of which are covered in [33, Section 12.2]. Earlier it was
stated that if $\mathrm{SOL}(K, F) \neq \emptyset$, then for $\xi_k \in \mathrm{SOL}(K, F + \alpha_k I)$ we have $\xi_k \to z_0$ where $z_0$ is
the least-norm element of $\mathrm{SOL}(K, F)$. Using that $\{\xi_k\}_{k \in \mathbb{N}}$ is a convergent sequence, we find
that $\{\|\xi_k\|\}_{k \in \mathbb{N}}$ is bounded and, in particular, there is some $M_\xi$ such that $\|\xi_k\| \leq M_\xi$ for all $k$,
e.g., $\|\xi_k\| \leq \sup_{z \in Z} \|z\|$.

Using this fact, the following lemma relates points $z(k)$ generated by Algorithm 2 to successive
solutions to the problems $\mathrm{VI}(Z, G + \alpha_k I)$ (each with $\alpha_k$ held constant). Recalling that $\xi_k$ is the
unique solution to $\mathrm{VI}(Z, G + \alpha_k I)$, we have the following result.


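Lemma 1: For all $k \in \mathbb{N}$

$$\|z(k) - \xi_k\|^2 \leq (1 + \gamma_k \alpha_k)\|z(k) - \xi_{k-1}\|^2 + M_\xi^2 \left(\frac{\alpha_{k-1} - \alpha_k}{\alpha_k}\right)^2 \left(\frac{1 + \gamma_k \alpha_k}{\gamma_k \alpha_k}\right).$$

Proof: First note that because $\xi_k$ solves $\mathrm{VI}(Z, G + \alpha_k I)$, we have

$$(\xi_{k-1} - \xi_k)^T \big(G(\xi_k) + \alpha_k \xi_k\big) \geq 0. \tag{7}$$

Similarly for $\xi_{k-1}$ we find

$$(\xi_k - \xi_{k-1})^T \big(G(\xi_{k-1}) + \alpha_{k-1} \xi_{k-1}\big) \geq 0. \tag{8}$$

Summing Equations (7) and (8), and using the monotonicity of $G$ gives

$$(\xi_{k-1} - \xi_k)^T (\alpha_k \xi_k - \alpha_{k-1} \xi_{k-1}) \geq 0.$$

Adding and subtracting $\alpha_k \xi_{k-1}$ inside the second set of parentheses then gives

$$(\xi_{k-1} - \xi_k)^T (\alpha_k - \alpha_{k-1}) \xi_{k-1} \geq \alpha_k (\xi_{k-1} - \xi_k)^T (\xi_{k-1} - \xi_k) = \alpha_k \|\xi_{k-1} - \xi_k\|^2.$$

Using the Cauchy-Schwarz inequality results in

$$\|\xi_{k-1} - \xi_k\| \leq \frac{|\alpha_{k-1} - \alpha_k|}{\alpha_k} M_\xi. \tag{9}$$

Expanding the term $\|z(k) + \xi_{k-1} - \xi_{k-1} - \xi_k\|^2$ and applying Equation (9) then gives

$$\|z(k) - \xi_k\|^2 \leq \|z(k) - \xi_{k-1}\|^2 + M_\xi^2 \left(\frac{\alpha_{k-1} - \alpha_k}{\alpha_k}\right)^2 + 2 M_\xi \frac{|\alpha_{k-1} - \alpha_k|}{\alpha_k} \|z(k) - \xi_{k-1}\|. \tag{10}$$

For the third term on the right-hand side above we have

$$2 M_\xi \frac{|\alpha_{k-1} - \alpha_k|}{\alpha_k} \|z(k) - \xi_{k-1}\| = 2 \Big(\sqrt{\gamma_k \alpha_k}\, \|z(k) - \xi_{k-1}\|\Big) \left(\frac{M_\xi |\alpha_{k-1} - \alpha_k|}{\alpha_k \sqrt{\gamma_k \alpha_k}}\right) \leq \gamma_k \alpha_k \|z(k) - \xi_{k-1}\|^2 + M_\xi^2 \left(\frac{\alpha_{k-1} - \alpha_k}{\alpha_k \sqrt{\gamma_k \alpha_k}}\right)^2, \tag{11}$$

where we have used the fact that $a^2 + b^2 \geq 2ab$ for $a, b \in \mathbb{R}$.

Substituting Equation (11) into Equation (10) gives the desired result. ■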

The other lemma we need concerns the convergence of sequences of random variables and 
enables a Lyapunov-like argument to be made for their convergence. 

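Lemma 2: ([?], Lemma 10, Page 49) Let $v_0, \ldots, v_k$ be a sequence of independent random
variables with $v_k \geq 0$ and $E[v_0] < \infty$. Suppose that

$$E[v_{k+1}] \leq (1 - \tau_k) v_k + \sigma_k,$$

with

$$0 \leq \tau_k \leq 1, \quad \sigma_k \geq 0, \quad \sum_{k=0}^{\infty} \tau_k = \infty, \quad \frac{\sigma_k}{\tau_k} \to 0.$$

Then $E[v_k] \to 0$. If, in addition, we have

$$\sum_{k=0}^{\infty} \sigma_k < \infty,$$

then $v_k \to 0$ almost surely and

$$P(v_j \leq \epsilon \text{ for all } j \geq k) \geq 1 - \frac{1}{\epsilon}\left(E[v_k] + \sum_{i=k}^{\infty} \sigma_i\right).$$

We now prove the convergence of Algorithm 2.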

Theorem 3: Let Assumptions 1-4 hold. Suppose that $\gamma_k > 0$ and $\alpha_k > 0$ satisfy the following
four conditions:








Then for noise signal $w$ with $E[w(k)] = 0$ and bounded variance for all $k \in \mathbb{N}$, for the update
rule

$$z(k+1) = \Pi_Z\big[z(k) - \gamma_k\big(G(z(k)) + \alpha_k z(k) + w(k)\big)\big]$$

we have $E[\|z(k) - z_0\|^2] \to 0$, where $z_0$ is the least-norm solution to Problem 1.
Let $L_G$ be the Lipschitz constant of $G$. If, in addition to the above, the sequence of terms




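$$\sigma_k = \left[1 - \gamma_k \alpha_k \left(2 - \gamma_k \alpha_k - \frac{\gamma_k}{\alpha_k} L_G^2 - 2 \gamma_k L_G\right)\right] M_\xi^2 \left(\frac{\alpha_{k-1} - \alpha_k}{\alpha_k}\right)^2 \left(\frac{1 + \gamma_k \alpha_k}{\gamma_k \alpha_k}\right) + \gamma_k^2 E[\|w(k)\|^2]$$

is summable, then the convergence estimate

$$P\big(\|z(j) - \xi_{j-1}\|^2 \leq \epsilon \text{ for all } j \geq k\big) \geq 1 - \frac{1}{\epsilon}\left(E[\|z(k) - \xi_{k-1}\|^2] + \sum_{i=k}^{\infty} \sigma_i\right) \tag{12}$$

holds.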


Proof: It was established earlier that $\mathrm{SOL}(Z, G) \neq \emptyset$ so that $\xi_k \to z_0$ where $z_0$ is the
least-norm element of $\mathrm{SOL}(Z, G)$ and where $\xi_k$ solves $\mathrm{VI}(Z, G + \alpha_k I)$. We now show that
$z(k+1) \to \xi_k$.

Because $\xi_k$ solves $\mathrm{VI}(Z, G + \alpha_k I)$ we have

$$\xi_k = \Pi_Z\big[\xi_k - \gamma_k\big(G(\xi_k) + \alpha_k \xi_k\big)\big].$$

Using the non-expansive property of the projection operator and taking the expectation of both
sides we find

$$E[\|z(k+1) - \xi_k\|^2] = E\Big[\big\|\Pi_Z\big[z(k) - \gamma_k\big(G(z(k)) + \alpha_k z(k) + w(k)\big)\big] - \Pi_Z\big[\xi_k - \gamma_k\big(G(\xi_k) + \alpha_k \xi_k\big)\big]\big\|^2\Big]$$

$$\leq E\Big[\big\|z(k) - \xi_k + \gamma_k\big(G(\xi_k) - G(z(k))\big) - \gamma_k \alpha_k \big(z(k) - \xi_k\big) - \gamma_k w(k)\big\|^2\Big]$$

$$\leq \|z(k) - \xi_k\|^2 - 2 \gamma_k \alpha_k \|z(k) - \xi_k\|^2 + \gamma_k^2 \|G(\xi_k) - G(z(k))\|^2 + 2 \gamma_k^2 \alpha_k \big(G(\xi_k) - G(z(k))\big)^T \big(\xi_k - z(k)\big) + \gamma_k^2 \alpha_k^2 \|\xi_k - z(k)\|^2 + \gamma_k^2 E[\|w(k)\|^2],$$

where the last inequality follows from the monotonicity of $G$, and where the fact that $E[w(k)] = 0$
has caused all terms containing $w(k)$ except $E[\|w(k)\|^2]$ to vanish.

Using the Cauchy-Schwarz inequality then gives

$$E[\|z(k+1) - \xi_k\|^2] \leq \|z(k) - \xi_k\|^2 - 2 \gamma_k \alpha_k \|z(k) - \xi_k\|^2 + \gamma_k^2 \|G(\xi_k) - G(z(k))\|^2 + 2 \gamma_k^2 \alpha_k \|G(\xi_k) - G(z(k))\| \, \|\xi_k - z(k)\| + \gamma_k^2 \alpha_k^2 \|\xi_k - z(k)\|^2 + \gamma_k^2 E[\|w(k)\|^2].$$

Assumptions 1-3 and the compactness of $M$ together imply that $G$ is Lipschitz and, denoting
its Lipschitz constant by $L_G$, we have

$$E[\|z(k+1) - \xi_k\|^2] \leq \big(1 - 2 \gamma_k \alpha_k + \gamma_k^2 L_G^2 + 2 \gamma_k^2 \alpha_k L_G + \gamma_k^2 \alpha_k^2\big) \|z(k) - \xi_k\|^2 + \gamma_k^2 E[\|w(k)\|^2].$$


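Defining

$$\theta_k := 1 - \gamma_k \alpha_k \left(2 - \gamma_k \alpha_k - \frac{\gamma_k}{\alpha_k} L_G^2 - 2 \gamma_k L_G\right)$$

and

$$\rho_k := M_\xi^2 \left(\frac{\alpha_{k-1} - \alpha_k}{\alpha_k}\right)^2 \left(\frac{1 + \gamma_k \alpha_k}{\gamma_k \alpha_k}\right),$$

applying Lemma 1 then gives

$$E[\|z(k+1) - \xi_k\|^2] \leq \theta_k (1 + \gamma_k \alpha_k) \|z(k) - \xi_{k-1}\|^2 + \theta_k \rho_k + \gamma_k^2 E[\|w(k)\|^2]. \tag{13}$$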


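By hypothesis we have

$$\gamma_k \alpha_k \to 0, \quad \frac{\gamma_k}{\alpha_k} \to 0, \quad \gamma_k \to 0,$$

with $\alpha_k > 0$ and $\gamma_k > 0$ for all $k$. Then there exists an $M > 0$ such that for all $k \geq M$ we have

$$\gamma_k \alpha_k \in (0, 1) \quad \text{and} \quad 0 < 1 - 2 \gamma_k \alpha_k < \theta_k < 1 - \gamma_k \alpha_k.$$

Then for all $k \geq M$

$$\theta_k (1 + \gamma_k \alpha_k) < \theta_k + \gamma_k \alpha_k$$

and thus for all $k \geq M$

$$\theta_k (1 + \gamma_k \alpha_k) < 1 - \gamma_k \alpha_k \left(1 - \gamma_k \alpha_k - \frac{\gamma_k}{\alpha_k} L_G^2 - 2 \gamma_k L_G\right) \in (0, 1).$$

In particular, take some $\theta \in (0, 1)$ so that

$$1 - \gamma_k \alpha_k \left(1 - \gamma_k \alpha_k - \frac{\gamma_k}{\alpha_k} L_G^2 - 2 \gamma_k L_G\right) < 1 - \gamma_k \alpha_k \theta$$

for all $k \geq M$. Then $1 - \gamma_k \alpha_k \theta \in (0, 1)$. Setting $\tau_k = \gamma_k \alpha_k \theta$ and $\sigma_k = \rho_k \theta_k + \gamma_k^2 E[\|w(k)\|^2]$ we
rewrite Equation (13) as

$$E[\|z(k+1) - \xi_k\|^2] \leq (1 - \tau_k) \|z(k) - \xi_{k-1}\|^2 + \sigma_k. \tag{14}$$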


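All that remains is to show that the conditions of Lemma 2 are met. First, $\tau_k \in (0, 1)$ by
construction. For all $k \geq M$ we have $\rho_k \geq 0$ and $\theta_k > 0$ so that $\sigma_k \geq 0$. Regarding summability
of $\tau_k$ we find

$$\sum_{k=M}^{\infty} \tau_k = \theta \sum_{k=M}^{\infty} \gamma_k \alpha_k = \infty$$

by hypothesis. To show that $\sigma_k / \tau_k \to 0$ we have

$$\frac{\sigma_k}{\tau_k} = \frac{\theta_k M_\xi^2 \left(\frac{\alpha_{k-1} - \alpha_k}{\alpha_k}\right)^2 \left(\frac{1 + \gamma_k \alpha_k}{\gamma_k \alpha_k}\right) + \gamma_k^2 E[\|w(k)\|^2]}{\gamma_k \alpha_k \theta}$$

$$= \big(1 - 2 \gamma_k \alpha_k + \gamma_k^2 \alpha_k^2 + \gamma_k^2 L_G^2 + 2 \gamma_k^2 \alpha_k L_G\big) M_\xi^2 \left(\frac{\alpha_{k-1} - \alpha_k}{\gamma_k \alpha_k^2}\right)^2 \frac{1 + \gamma_k \alpha_k}{\theta} + \frac{\gamma_k}{\theta \alpha_k} E[\|w(k)\|^2].$$

Using the hypotheses regarding $\gamma_k$ and $\alpha_k$ we have

$$1 - 2 \gamma_k \alpha_k + \gamma_k^2 \alpha_k^2 + \gamma_k^2 L_G^2 + 2 \gamma_k^2 \alpha_k L_G \to 1, \quad 1 + \gamma_k \alpha_k \to 1,$$

along with

$$\frac{\alpha_{k-1} - \alpha_k}{\gamma_k \alpha_k^2} \to 0,$$

so that the first term in $\sigma_k / \tau_k$ goes to zero. It was established in Section III-E that $E[\|w(k)\|^2]$ is
bounded above for all $k$, namely that $E[\|w(k)\|^2] \leq K_w$ for some $K_w > 0$. Because $\gamma_k / \alpha_k \to 0$ we
have $\frac{\gamma_k}{\theta \alpha_k} K_w \to 0$ and hence $\sigma_k / \tau_k \to 0$ as desired, and the first part of the theorem follows from
Lemma 2.

When the sequence $\{\sigma_k\}_{k \in \mathbb{N}}$ is summable, the additional convergence estimate is a straightforward
application of Lemma 2 as well. ■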

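One valid choice of $\gamma_k$ and $\alpha_k$ satisfying conditions 1-4 in Theorem 3 is

$$\alpha_k = \bar{\alpha} k^{-c_1} \quad \text{and} \quad \gamma_k = \bar{\gamma} k^{-c_2}$$

with $\bar{\alpha} > 0$, $\bar{\gamma} > 0$, $0 < c_1 < c_2$, and $c_1 + c_2 < 1$.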
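
As a quick numerical sanity check of this power-law schedule, the sketch below evaluates three quantities that the proof of Theorem 3 drives to zero: $\gamma_k \alpha_k$, $\gamma_k / \alpha_k$, and $(\alpha_{k-1} - \alpha_k)/(\gamma_k \alpha_k^2)$. The function names and the default constants (taken to match the values used in the simulations of Section V) are choices made for this illustration only, and a finite table of values is of course not a proof of the limits.

```python
def step_sizes(k, alpha_bar=0.1, gamma_bar=0.01, c1=0.3, c2=0.52):
    """Return (alpha_k, gamma_k) for the power-law schedule at iteration k >= 1."""
    return alpha_bar * k ** (-c1), gamma_bar * k ** (-c2)


def schedule_diagnostics(k, **params):
    """Quantities that should shrink as k grows under the schedule above."""
    alpha_k, gamma_k = step_sizes(k, **params)
    alpha_prev, _ = step_sizes(k - 1, **params)
    return {
        "gamma*alpha": gamma_k * alpha_k,
        "gamma/alpha": gamma_k / alpha_k,
        "drift": (alpha_prev - alpha_k) / (gamma_k * alpha_k ** 2),
    }


checkpoints = [10, 10 ** 3, 10 ** 6]
diagnostics = [schedule_diagnostics(k) for k in checkpoints]
for k, row in zip(checkpoints, diagnostics):
    print(k, row)
```

With $c_1 + c_2$ close to 1 the drift term decays only like $k^{c_1 + c_2 - 1}$, so it can remain large for many iterations even though it tends to zero.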


B. Convergence Rate Estimates 

For the above choice of step-size, we derive bounds on $c_1$ and $c_2$ which are sufficient to make
$\sigma_k$ summable. As shown in the proof of Theorem 3 there exists an $M > 0$ such that for all
$k \geq M$ we have $\theta_k \in (0, 1)$, so that for all $k \geq M$ we have

$$\sigma_k \leq M_\xi^2 \left(\frac{\alpha_{k-1} - \alpha_k}{\alpha_k}\right)^2 \left(\frac{1 + \gamma_k \alpha_k}{\gamma_k \alpha_k}\right) + \gamma_k^2 E[\|w(k)\|^2]. \tag{15}$$

To make the second term in Equation (15) summable, we can set

$$\gamma_k = \bar{\gamma} k^{-c_2}$$

with $c_2 > \frac{1}{2}$. Again using $K_w$ to denote an upper bound on the variance of $w(k)$ gives

$$\sum_{k=1}^{\infty} \gamma_k^2 E[\|w(k)\|^2] \leq \bar{\gamma}^2 K_w \zeta(2 c_2),$$

where $\zeta(\cdot)$ is the Riemann zeta function [42] defined as

$$\zeta(p) = \sum_{n=1}^{\infty} \frac{1}{n^p}, \tag{16}$$

which takes finite values for arguments $p > 1$.


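Regarding the first term in Equation (15), we note that there is some $\tilde{M} > 0$ such that

$$1 \leq \frac{1}{\bar{\alpha} \bar{\gamma} k^{-c_1} k^{-c_2}}$$

for all $k \geq \tilde{M}$ and therefore

$$\left(\frac{\alpha_{k-1} - \alpha_k}{\alpha_k}\right)^2 \left(\frac{1 + \gamma_k \alpha_k}{\gamma_k \alpha_k}\right) \leq 2 \left(\frac{\alpha_{k-1} - \alpha_k}{\alpha_k}\right)^2 \frac{1}{\bar{\alpha} \bar{\gamma} k^{-(c_1 + c_2)}}$$

for all such $k$. Substituting $\alpha_k = \bar{\alpha} k^{-c_1}$ and expanding the squared term gives

$$2 \left(\frac{\alpha_{k-1} - \alpha_k}{\alpha_k}\right)^2 \frac{1}{\bar{\alpha} \bar{\gamma} k^{-(c_1 + c_2)}} = \frac{2 \left[\left(1 - \frac{1}{k}\right)^{-2 c_1} - 2 \left(1 - \frac{1}{k}\right)^{-c_1} + 1\right]}{\bar{\alpha} \bar{\gamma} k^{-(c_1 + c_2)}}. \tag{17}$$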


To approximate the terms containing $1/k$ we use a (truncated) power series expansion, namely
that for $x \in (-1, 1)$

$$(1 - x)^{-r} \approx 1 + r x + \frac{1}{2} r (r + 1) x^2 + \frac{1}{6} r (r + 1)(r + 2) x^3. \tag{18}$$















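Applying Equation (18) to Equation (17) gives

$$\frac{2 \left[\left(1 - \frac{1}{k}\right)^{-2 c_1} - 2 \left(1 - \frac{1}{k}\right)^{-c_1} + 1\right]}{\bar{\alpha} \bar{\gamma} k^{-(c_1 + c_2)}} \approx \frac{2 c_1^2}{\bar{\alpha} \bar{\gamma} k^{2 - (c_1 + c_2)}} + \frac{2 (c_1^3 + c_1^2)}{\bar{\alpha} \bar{\gamma} k^{3 - (c_1 + c_2)}}.$$

We see that sums of such terms are given by

$$\sum_{k=1}^{\infty} \frac{2 \left[\left(1 - \frac{1}{k}\right)^{-2 c_1} - 2 \left(1 - \frac{1}{k}\right)^{-c_1} + 1\right]}{\bar{\alpha} \bar{\gamma} k^{-(c_1 + c_2)}} \approx \frac{2 c_1^2}{\bar{\alpha} \bar{\gamma}} \zeta\big(2 - (c_1 + c_2)\big) + \frac{2 (c_1^3 + c_1^2)}{\bar{\alpha} \bar{\gamma}} \zeta\big(3 - (c_1 + c_2)\big). \tag{19}$$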
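
The quality of the truncated expansion can be checked numerically. The sketch below compares the exact bracketed term $(1 - 1/k)^{-2c_1} - 2(1 - 1/k)^{-c_1} + 1$, which equals $((\alpha_{k-1} - \alpha_k)/\alpha_k)^2$ for $\alpha_k$ proportional to $k^{-c_1}$, against the two-term approximation $c_1^2/k^2 + (c_1^3 + c_1^2)/k^3$ obtained from the power series. The function names and the choice $c_1 = 0.3$ are assumptions made for illustration.

```python
def exact_bracket(k, c1):
    """((1 - 1/k)^(-c1) - 1)^2, algebraically equal to the expanded bracket."""
    base = 1.0 - 1.0 / k
    return (base ** (-c1) - 1.0) ** 2


def truncated_bracket(k, c1):
    """Leading terms of the bracket after applying the truncated power series."""
    return c1 ** 2 / k ** 2 + (c1 ** 3 + c1 ** 2) / k ** 3


def relative_error(k, c1):
    exact = exact_bracket(k, c1)
    return abs(exact - truncated_bracket(k, c1)) / exact


for k in (10, 100, 1000):
    print(k, exact_bracket(k, 0.3), truncated_bracket(k, 0.3), relative_error(k, 0.3))
```

The neglected terms are of order $1/k^4$, so the approximation is loosest at the smallest indices and tightens quickly as $k$ grows.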
ay ay 


Returning to Equation (15) and using the results of Equations (16) and (19) gives

$$\sum_{k=1}^{\infty} \sigma_k \leq \bar{\gamma}^2 K_w \zeta(2 c_2) + M_\xi^2 \left[\frac{2 c_1^2}{\bar{\alpha} \bar{\gamma}} \zeta\big(2 - (c_1 + c_2)\big) + \frac{2 (c_1^3 + c_1^2)}{\bar{\alpha} \bar{\gamma}} \zeta\big(3 - (c_1 + c_2)\big)\right]. \tag{20}$$

Due to the approximations made and ranges of $k$ considered in bounding this sum, we can
only guarantee that the convergence estimate relying on $\sum_{k=1}^{\infty} \sigma_k$ will hold for $k \geq \max\{M, \tilde{M}\}$,
where $\tilde{M}$ is the threshold beyond which $1 \leq 1/(\gamma_k \alpha_k)$ holds.
However, for $k < M$ we will often have $\sigma_k < 0$ (as when $L_G$ is large) and thus we expect the
bound in Equation (20) to hold for a range of values of $k < M$ because negative terms with
such indices have been over-estimated by including positive terms at such indices in Equation
(20). In addition, we expect $\bar{\alpha}$ and $\bar{\gamma}$ to be small enough that $\tilde{M}$ will often be small, e.g., less
than 10, thus allowing this bound to hold over a wide range of values of $k$.
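
To make the summability bound concrete, the following sketch evaluates its right-hand side for given constants. It is an illustration under stated assumptions: `zeta` is a small stand-in implementation (partial sum plus Euler-Maclaurin tail terms) and not a library call, the regularization-drift part is scaled by $M_\xi^2$ on the assumption that this factor carries through from the bound on $\sigma_k$, and the values passed for $K_w$ and $M_\xi$ in the usage line are placeholders, not values from this paper.

```python
import math


def zeta(s, cutoff=1000):
    """Riemann zeta for s > 1: partial sum plus Euler-Maclaurin tail correction."""
    if s <= 1.0:
        raise ValueError("the series defining zeta(s) diverges for s <= 1")
    head = sum(n ** (-s) for n in range(1, cutoff))
    tail = (cutoff ** (1.0 - s) / (s - 1.0)      # integral of x^(-s) from cutoff
            + 0.5 * cutoff ** (-s)               # half of the first tail term
            + s * cutoff ** (-s - 1.0) / 12.0)   # first derivative correction
    return head + tail


def sigma_sum_bound(alpha_bar, gamma_bar, c1, c2, noise_var_bound, xi_norm_bound):
    """Noise part plus regularization-drift part of the bound on sum_k sigma_k."""
    if not (0.0 < c1 < c2 and c2 > 0.5 and c1 + c2 < 1.0):
        raise ValueError("need 0 < c1 < c2, c2 > 1/2 and c1 + c2 < 1")
    s = c1 + c2
    noise_part = gamma_bar ** 2 * noise_var_bound * zeta(2.0 * c2)
    drift_part = (xi_norm_bound ** 2 / (alpha_bar * gamma_bar)) * (
        2.0 * c1 ** 2 * zeta(2.0 - s)
        + 2.0 * (c1 ** 3 + c1 ** 2) * zeta(3.0 - s))
    return noise_part + drift_part


# Placeholder constants: K_w = 1 and M_xi = 1 are hypothetical.
bound = sigma_sum_bound(0.1, 0.01, 0.3, 0.52, noise_var_bound=1.0, xi_norm_bound=1.0)
print(bound)
```

Because $\zeta(p)$ grows without bound as $p$ approaches 1 from above, choices with $2 c_2$ near 1 or $c_1 + c_2$ near 1 make the bound large, which mirrors the slow decay those choices imply.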

To apply the bound in Equation (12), we also need to estimate the term $E[\|z(k) - \xi_{k-1}\|^2]$.
Returning to Equation (14) and taking the expectation of both sides one timestep earlier gives

$$E[\|z(k) - \xi_{k-1}\|^2] \leq (1 - \tau_{k-1}) E[\|z(k-1) - \xi_{k-2}\|^2] + \sigma_{k-1}, \tag{21}$$


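which is a (time-varying) affine recurrence relation in the expected error in the optimization
algorithm. Solving Equation (21) (see, e.g., [43, Section 2.1.1.2]), we find that

$$E[\|z(k) - \xi_{k-1}\|^2] \leq \left(\prod_{n=1}^{k-1} (1 - \tau_n)\right) E[\|z(1) - \xi_0\|^2] + \sum_{n=1}^{k-1} \sigma_n \prod_{m=n+1}^{k-1} (1 - \tau_m).$$

Defining the diameter of the set $Z$ via $D_Z = \sup_{z_1, z_2 \in Z} \|z_1 - z_2\|$, we can bound the initial error
via $E[\|z(1) - \xi_0\|^2] \leq D_Z^2$, giving

$$E[\|z(k) - \xi_{k-1}\|^2] \leq \left(\prod_{n=1}^{k-1} (1 - \tau_n)\right) D_Z^2 + \sum_{n=1}^{k-1} \sigma_n \prod_{m=n+1}^{k-1} (1 - \tau_m). \tag{22}$$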


In Equation (12), one can compute the sum $\sum_{i=k}^{\infty} \sigma_i$ by using the analytic bound for $\sum_{k=1}^{\infty} \sigma_k$
given in Equation (20) and subtracting the first $k - 1$ values of $\sigma_i$ from this value. Combined
with the upper bound on $E[\|z(k) - \xi_{k-1}\|^2]$ given in Equation (22), one can then use the bound
in Equation (12) to determine the probability with which the error in the optimization algorithm
stays within some bound for all time. Having explored convergence in the presence of privacy,
we now examine the trade-off between the two competing objectives of privacy and convergence.
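
The procedure just described can be sketched in a few lines. The code below iterates an affine recurrence of the form $v \leftarrow (1 - \tau_n) v + \sigma_n$, compares it with the equivalent product-sum form, and plugs the result into a probability lower bound of the form $1 - (E[v_k] + \sum_{i \geq k} \sigma_i)/\epsilon$. All names and the sample sequences are hypothetical; in practice $\tau_n$ and $\sigma_n$ would come from the step-size rule and the problem constants.

```python
import math


def iterate_error_bound(initial_bound, taus, sigmas):
    """Iterate v <- (1 - tau_n) v + sigma_n starting from v = initial_bound."""
    v = initial_bound
    for tau_n, sigma_n in zip(taus, sigmas):
        v = (1.0 - tau_n) * v + sigma_n
    return v


def closed_form_error_bound(initial_bound, taus, sigmas):
    """Product-sum solution of the same affine recurrence."""
    total = initial_bound * math.prod(1.0 - t for t in taus)
    for n, sigma_n in enumerate(sigmas):
        # sigma_n is damped by every contraction factor applied after step n.
        total += sigma_n * math.prod(1.0 - t for t in taus[n + 1:])
    return total


def probability_lower_bound(epsilon, error_bound, sigma_tail):
    """Lower bound 1 - (error_bound + sigma_tail) / epsilon, clipped at zero."""
    return max(0.0, 1.0 - (error_bound + sigma_tail) / epsilon)


# Hypothetical sequences, chosen only to exercise the functions.
sample_taus = [0.5, 0.25, 0.2, 0.1]
sample_sigmas = [0.4, 0.2, 0.1, 0.05]
iterated = iterate_error_bound(4.0, sample_taus, sample_sigmas)
closed = closed_form_error_bound(4.0, sample_taus, sample_sigmas)
print(iterated, closed, probability_lower_bound(5.0, iterated, 0.1))
```

The clipping at zero reflects that the estimate is vacuous whenever the error bound plus the tail sum exceeds the tolerance $\epsilon$.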


C. The Trade-off Between Privacy and Convergence 

In this section we derive a quantifiable trade-off between privacy and convergence, and for
concreteness we focus on the case of $\epsilon$-differential privacy, though a similar trade-off can be
derived for $(\epsilon, \delta)$-differential privacy.

Returning to Equation (13) we find the inequality

$$E[\|z(k+1) - \xi_k\|^2] \leq \theta_k (1 + \alpha_k \gamma_k) \|z(k) - \xi_{k-1}\|^2 + \theta_k \rho_k + \gamma_k^2 E[\|w(k)\|^2], \tag{23}$$

where we see that only the term $E[\|w(k)\|^2]$ depends upon the noise added for privacy. Given
that $w(k)$ has zero mean, we find $E[\|w(k)\|^2] = \mathrm{var}(w(k))$. In the case of $\epsilon$-differential privacy
we have $w(k) \sim \mathrm{Lap}(0, b/\epsilon)^r$ so that

$$\mathrm{var}[w(k)] = \frac{W}{\epsilon^2}, \tag{24}$$

where $W := W(\Delta_1 g, \Delta_1 g_{x_i}, B)$ is a constant that depends upon the systems of interest, $g$ and
$g_{x_i}$, and the adjacency parameter, $B$.





Returning to Equation (23) and substituting in Equation (24) we find

$$E[\|z(k+1) - \xi_k\|^2] \leq \theta_k (1 + \alpha_k \gamma_k) \|z(k) - \xi_{k-1}\|^2 + \theta_k \rho_k + \gamma_k^2 \frac{W}{\epsilon^2}.$$

The additive term $\gamma_k^2 W / \epsilon^2$ is the only term in which the privacy parameter $\epsilon$ appears, and this term
can be regarded as a penalty on convergence because it allows the expected error
$E[\|z(k+1) - \xi_k\|^2]$ to grow from $\|z(k) - \xi_{k-1}\|^2$. Viewing this term as a convergence penalty then reveals
a fundamental trade-off between privacy and convergence: implementing $\epsilon$-differential privacy
comes at the cost of a convergence penalty proportional to $1/\epsilon^2$. We state this trade-off succinctly
and informally by writing

$$\text{Privacy}(\epsilon) \longleftrightarrow \text{Convergence}(\epsilon^{-2}).$$
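
As a worked instance of this scaling, consider a single scalar noise term and assume, for illustration only, the usual Laplace calibration in which the scale is the 1-norm sensitivity divided by $\epsilon$, written $b = \Delta_1 / \epsilon$ here. A Laplace random variable with scale $b$ has variance $2b^2$, so

```latex
\operatorname{var}[w] = 2 b^2 = \frac{2 \Delta_1^2}{\epsilon^2},
\qquad
\frac{\operatorname{var}[w]\,\big|_{\epsilon/2}}{\operatorname{var}[w]\,\big|_{\epsilon}}
  = \frac{2 \Delta_1^2 / (\epsilon/2)^2}{2 \Delta_1^2 / \epsilon^2} = 4 .
```

Under that assumption, halving $\epsilon$ (a stronger privacy guarantee) quadruples the additive penalty at every iteration, while the step size $\gamma_k^2$ multiplying it is unchanged.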


V. Simulation Results 

Below we present numerical simulation results for a system with $n = 10$ agents and $m = 6$
constraints. We simulate both $\epsilon$- and $(\epsilon, \delta)$-differential privacy.

A. Example Problem

Let there be $n = 10$ agents, each with state $x_i \in \mathbb{R}^2$ and using ensemble objective function

f { x ) = ((27,1 - 5) + (27,2 + 5)) + H27II 2 + 

+ ((27,1 — 8 ) + (27,2 — 8 )) + 

+ ((®6,i — 10) + (^6,2 — 10)) + ((2:7,1 + 10) + (2:7,2 + 10)) 



/ 7 \ 

2 

f 0) 

+ 

27 + 

+ (( 2 : 9,1 — 6 ) + 2 : 9 , 2 ) + 

xio - 


V 0 / 

V 8 / 




where $x_{i,j}$ is the $j$th state of agent $i$ and the per-agent objectives can be discerned in the obvious
way. The constraints on the agents are




















( Ikll 2 + INII 2 + INII 2 -10 ^ 

ll^4|| 2 +IN|| 2 +||a; 6 || 2 -50 

||x 7 || 2 + ||a; 8 || 2 + ||a; 9 || 2 -50 
g[x) = ^ < 0. 

x i,i + ®5,i + ®io,i - 50 
x 4,2 + x 7 ,l + x 9,2 ~ 20 

V IM 2 + ini 2 -30 

Each agent was also constrained to lie in the box $X_i = [-10, 10] \times [-10, 10]$. The Lipschitz
constants of $g$ were computed to be $K_1^g = 39.82$ and $K_2^g = 56.71$. The Lipschitz constants for
each $g_{x_i}$ are shown in Table I.

TABLE I

Values of $K_1^i$ and $K_2^i$ for $g_{x_i}$, $i \in \{1, \ldots, 10\}$


In both simulation runs below, the step-size rule discussed at the end of Section IV was used
with the values

$$\bar{\alpha} = 0.1, \quad \bar{\gamma} = 0.01, \quad c_1 = 0.3, \quad \text{and} \quad c_2 = 0.52,$$

and all states and Kuhn-Tucker multipliers were initialized to zero, i.e., $x_i(0) = 0$ for all $i \in I$
and $\mu(0) = 0$.




















Fig. 1. The values of $\|x(k) - x_0\|_2$ for $k = 1, \ldots, 100{,}000$ under $\epsilon$-differential privacy with Algorithm 2. The steady, monotone
descent toward $x_0$ indicates numerical convergence to $x_0$ in the presence of noise.


B. Simulation of $\epsilon$-differential privacy

The adjacency parameter was chosen to be $B = 1$. The value $\epsilon = \ln 2$ was used for all systems.
The distribution and variance of each entry of each noisy signal were computed and are listed
in Table II, where we use the notation for the PDF of a random scalar with the understanding
that each entry of the random matrices $w_i$ was generated using such a distribution.


Noise     Distribution      Variance
w_1       Lap(0, 5.771)     66.60
w_2       Lap(0, 2.885)     16.65
w_3       Lap(0, 2.885)     16.65
w_4       Lap(0, 2.885)     16.65
w_5       Lap(0, 2.885)     16.65
w_6       Lap(0, 5.771)     66.60
w_7       Lap(0, 2.885)     16.65
w_8       Lap(0, 5.771)     66.60
w_9       Lap(0, 2.885)     16.65
w_10      Lap(0, 2.885)     16.65

TABLE II

Noisy signals and their distributions for $\epsilon$-differential privacy
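
The tabulated scales and variances can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard Laplace-mechanism calibration, in which the scale equals the 1-norm sensitivity divided by $\epsilon$, together with the sensitivity bound $K_1 B$ from Section III; this calibration is an assumption of the example, though it reproduces the reported scale for $w_g$ from $K_1^g = 39.82$. The function names are hypothetical.

```python
import math


def laplace_scale(sensitivity, epsilon):
    """Scale b of Lap(0, b) under the assumed calibration b = sensitivity / epsilon."""
    return sensitivity / epsilon


def laplace_variance(scale):
    """Variance of a Lap(0, b) random variable, which is 2 b^2."""
    return 2.0 * scale ** 2


epsilon = math.log(2.0)   # privacy parameter used in this subsection
adjacency_B = 1.0         # adjacency parameter used in this subsection
K1_g = 39.82              # reported 1-norm Lipschitz constant of g

scale_g = laplace_scale(K1_g * adjacency_B, epsilon)
print(scale_g, laplace_variance(scale_g))
print(laplace_variance(5.771), laplace_variance(2.885))
```

The two per-entry scales in the table differ by a factor of two, so their variances differ by a factor of four, consistent with the tabulated values.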


The distribution for $w_g$ was $\mathrm{Lap}(0, 57.45)$ with variance $6.600 \cdot 10^3$. Using this problem
formulation, Algorithm 2 was run for 100,000 iterations. To show the behavior of the algorithm
over time, the least-norm saddle point of $L$, $z_0 = (x_0, \mu_0)$, was computed ahead of time and
the values of $\|x(k) - x_0\|_2$ and $\|\mu(k) - \mu_0\|_2$ are shown in Figures 1 and 2, respectively, for
$1 \leq k \leq 100{,}000$.²

Fig. 2. The values of $\|\mu(k) - \mu_0\|_2$ for $k = 1, \ldots, 100{,}000$ under $\epsilon$-differential privacy with Algorithm 2. Here we see an
initial descent followed by a period of oscillations as $\mu(k)$ approaches $\mu_0$.

In Figures 1 and 2 we see a clear decreasing trend in both $\|x(k) - x_0\|_2$ and $\|\mu(k) - \mu_0\|_2$,
with the primal error appearing to be monotonically decreasing and the dual error oscillating
while showing a general decreasing trend. The oscillations seen are expected given that the
variance of the noises added is constant while $G$ decreases in magnitude as the saddle point $z_0$
is approached. In fact, it is known that descent will be achieved in a gradient method as long
as the norm of noise added to the gradient is less than the norm of the gradient itself [44]. In
light of this fact, the trends seen in Figures 1 and 2 are not surprising because the gradients in
$G$ in Algorithm 2 will have large norms far from $z_0$, thereby allowing them to “overpower” the
noise added, while close to $z_0$ their norms will be smaller and the noise can dominate, causing
increases in the distance to $z_0$ at some timesteps. Of course, $z(k) \to z_0$ asymptotically because
these increases in $\|z(k) - z_0\|_2$ average out over very long periods of time.

The initial error values here were

$$\|x(0) - x_0\|_2 = 13.19 \quad \text{and} \quad \|\mu(0) - \mu_0\|_2 = 2.169,$$

and in this run the final error values were

$$\|x(100{,}000) - x_0\|_2 = 0.2706 \quad \text{and} \quad \|\mu(100{,}000) - \mu_0\|_2 = 0.2842,$$

with these values after half of the total runtime being

$$\|x(50{,}000) - x_0\|_2 = 0.7658 \quad \text{and} \quad \|\mu(50{,}000) - \mu_0\|_2 = 0.2225.$$

These values confirm what can be seen visually in Figures 1 and 2: shorter runtimes than
100,000 timesteps can be used while ending at a reasonable distance from $z_0$ and, in light of the
large variances of some noises present, reasonable numbers of iterations produce an approach
toward $z_0$ that would be useful in many applications.

² Though the 1-norm is used for other aspects of $\epsilon$-differential privacy, we measure distance to $z_0$ using the 2-norm to allow
for meaningful visual comparison of the plots corresponding to $\epsilon$-differential privacy in this subsection to those corresponding
to $(\epsilon, \delta)$-differential privacy in the next subsection.

Fig. 3. The values of $\|x(k) - x_0\|_2$ for $k = 1, \ldots, 100{,}000$ under $(\epsilon, \delta)$-differential privacy with Algorithm 2. The rapid
descent toward $x_0$ and clear decreasing trend thereafter indicate numerical convergence to $x_0$ in the presence of noise.


C. Simulation of $(\epsilon, \delta)$-differential privacy

In this case the adjacency parameter was chosen to be $B = 1$. The values $\epsilon = \ln 2$ and $\delta = 0.01$
were used for all systems, giving $\kappa(\delta, \epsilon) = 3.559$. Using this privacy policy, the distribution and
variance of each noisy signal were computed and are listed in Table III.

The distribution for $w_g$ was $\mathcal{N}(0_{6 \times 1}, 4.073 \cdot 10^4 \, I_{6 \times 6})$ with variance $4.073 \cdot 10^4$. In Table III
we record the distribution of each entry of the matrices $w_i$, $i \in I$, with the understanding that
each $w_i$ has i.i.d. entries.

Using this problem formulation, Algorithm 2 was run for 100,000 iterations and the values of
$\|x(k) - x_0\|_2$ and $\|\mu(k) - \mu_0\|_2$ for $1 \leq k \leq 100{,}000$ are plotted in Figures 3 and 4, respectively.






Noise     Distribution      Variance
w_1       N(0, 101.3)       101.3
w_2       N(0, 50.66)       50.66
w_3       N(0, 50.66)       50.66
w_4       N(0, 50.66)       50.66
w_5       N(0, 50.66)       50.66
w_6       N(0, 101.3)       101.3
w_7       N(0, 50.66)       50.66
w_8       N(0, 101.3)       101.3
w_9       N(0, 50.66)       50.66
w_10      N(0, 50.66)       50.66

TABLE III

Noisy signals and their distributions for $(\epsilon, \delta)$-differential privacy
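
The Gaussian noise levels can be cross-checked in the same spirit. The sketch below assumes a commonly used calibration for the Gaussian mechanism, $\kappa(\delta, \epsilon) = \big(K_\delta + \sqrt{K_\delta^2 + 2\epsilon}\big) / (2\epsilon)$ with $K_\delta = Q^{-1}(\delta)$ the inverse Gaussian tail function, and a standard deviation of $\kappa$ times the 2-norm sensitivity. This formula is not stated in this section and is an assumption of the example, but with $\epsilon = \ln 2$, $\delta = 0.01$, and $K_2^g = 56.71$ it reproduces the reported $\kappa = 3.559$ and the reported variance of $w_g$ when the adjacency parameter is 1. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def kappa(delta, epsilon):
    """Assumed Gaussian-mechanism multiplier (K + sqrt(K^2 + 2 eps)) / (2 eps)."""
    k_delta = NormalDist().inv_cdf(1.0 - delta)   # Q^{-1}(delta)
    return (k_delta + math.sqrt(k_delta ** 2 + 2.0 * epsilon)) / (2.0 * epsilon)


def gaussian_std(sensitivity, delta, epsilon):
    """Standard deviation of the added noise: kappa times the 2-norm sensitivity."""
    return kappa(delta, epsilon) * sensitivity


epsilon = math.log(2.0)
delta = 0.01
K2_g = 56.71   # reported 2-norm Lipschitz constant of g; adjacency parameter taken as 1

kappa_value = kappa(delta, epsilon)
variance_g = gaussian_std(K2_g, delta, epsilon) ** 2
print(kappa_value, variance_g)
```

Unlike the Laplace case, the multiplier here depends on both $\delta$ and $\epsilon$, so tightening either parameter raises the noise level.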


In Figures 3 and 4 we see a similar trend to Figures 1 and 2: steady, monotone decrease in the
primal error, and general decreases in the dual error with noticeable oscillations present.

The initial error values for this run were

$$\|x(0) - x_0\|_2 = 13.19 \quad \text{and} \quad \|\mu(0) - \mu_0\|_2 = 2.169.$$

The final error values here were

$$\|x(100{,}000) - x_0\|_2 = 1.1965 \quad \text{and} \quad \|\mu(100{,}000) - \mu_0\|_2 = 0.7413,$$

while after half of the total timesteps taken these values were

$$\|x(50{,}000) - x_0\|_2 = 1.7857 \quad \text{and} \quad \|\mu(50{,}000) - \mu_0\|_2 = 0.2500,$$

indicating a rapid initial descent towards $z_0$ and close proximity to it thereafter.

Both simulation examples show a rapid decrease in the distance from $z(k)$ to $z_0$. Such a rapid
decrease lends itself to use of this algorithm in practical applications because it allows for useful
improvements to be made in the value of $f$ in a reasonable runtime while respecting the set
and functional constraints of the problem. The theoretical and simulation results presented here
demonstrate the utility of the iterative Tikhonov regularization, even in the presence of noise
with large variance. This robustness is further supported by the simulation results in [?] and
demonstrates that in a practical setting strong, quantifiable guarantees of privacy can be achieved
while providing useful convergence guarantees in the optimization problem of interest. Critical
to the success of these numerical results is all noise being zero mean, and it is a feature of
differential privacy that zero mean noise is effective at protecting sensitive information.

Fig. 4. The values of $\|\mu(k) - \mu_0\|_2$ for $k = 1, \ldots, 100{,}000$ under $(\epsilon, \delta)$-differential privacy with Algorithm 2. The initial
approach toward $\mu_0$ and oscillations in distance beyond that point indicate numerical convergence to $\mu_0$ when noise is added
for differential privacy.

VI. Conclusion 

A differentially private optimization algorithm for teams of many agents coordinated by a central
cloud computer was presented. This problem was treated as a stochastic variational inequality
and solved using a Tikhonov-regularized Goldstein-Levitin-Polyak iteration. Its convergence was
shown for both $\epsilon$- and $(\epsilon, \delta)$-differential privacy and numerical convergence of the algorithm was
shown in simulation, demonstrating the ability to arrive at a collective decision while maintaining
privacy for the users involved in making it.